Architectures for Speech Synthesis from Human Voice Audio Database

Authors

Jiping Sun

Feng Zhang

Abstract

This paper describes ongoing work in the NLP group at Otago University on speech synthesis from diphone audio databases prepared from recordings of the human voice. A synthesis system built on such databases is claimed to reach human-level performance, provided the descriptions used to extract and manipulate the voice data are detailed enough. The software supplied by the project that develops the databases [1][2][3] accepts many control parameters, which are written to a text file that serves as the directive for speech synthesis. For a speech system whose dictionary entries store detailed phonetic descriptions, synthesizing speech for words is therefore a fairly straightforward task. Our work, however, aims to build a text-to-speech transducer without a dictionary, so that there is no restriction on the kinds of words it can process. Instead, we use a neural network as a learning system that stores the text-to-speech correspondence knowledge needed for the task. In this paper we introduce the training parameters and the encoding method used for high-quality speech synthesis, as well as the text-to-speech system as a whole. In particular, we describe work currently under way on synthesizing French words, based on one of the voice databases provided by the MBROLA project [3].
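As a rough illustration of the "text file as a directive" mentioned in the abstract, the sketch below writes phoneme lines in the plain-text layout that the MBROLA synthesizer reads: a phoneme symbol, a duration in milliseconds, and optional pairs of (position in percent, pitch in Hz). The function names `pho_line` and `build_pho`, the phoneme symbols, and all durations and pitch values are assumptions chosen for illustration. They are not taken from the paper and do not represent the authors' encoding method.

```python
from typing import Iterable, Sequence, Tuple

# A pitch target: (position as a percentage of the phoneme duration, pitch in Hz).
PitchPoint = Tuple[int, int]
# One segment: (phoneme symbol, duration in ms, pitch targets).
Segment = Tuple[str, int, Sequence[PitchPoint]]


def pho_line(phoneme: str, duration_ms: int,
             pitch_points: Sequence[PitchPoint] = ()) -> str:
    """Format one directive line: symbol, duration, then pitch pairs."""
    fields = [phoneme, str(duration_ms)]
    for position_pct, pitch_hz in pitch_points:
        fields += [str(position_pct), str(pitch_hz)]
    return " ".join(fields)


def build_pho(segments: Iterable[Segment]) -> str:
    """Join the segments into the text of a directive file."""
    return "\n".join(pho_line(*seg) for seg in segments) + "\n"


# Hypothetical segments for a French word; the values are illustrative only.
segments = [
    ("_", 100, []),                      # leading silence
    ("b", 60, [(0, 120)]),
    ("o~", 120, [(50, 130)]),
    ("Z", 80, []),
    ("u", 120, [(20, 125), (80, 110)]),
    ("R", 90, []),
    ("_", 100, []),                      # trailing silence
]

pho_text = build_pho(segments)
print(pho_text)
```

In a dictionary-free system such as the one the abstract describes, the segment list would come from the learned text-to-speech mapping rather than being written by hand as it is here.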


Similar resources

A New Algorithm for Voice Activity Detection Based on Wavelet Packets (RESEARCH NOTE)

Speech constitutes much of the communicated information; most other perceived audio signals do not carry nearly as much information. Indeed, many non-speech signals may be classified as ‘noise’ in human communication. The process of separating conversational speech and noise is termed voice activity detection (VAD). This paper describes a new approach to VAD which is based on the Wavelet ...


Emotional Speech Synthesis

Emotional speech synthesis is an important part of the puzzle on the long way to human-like artificial human-machine interaction. During the way, lots of stations like emotional audio messages or believable characters in gaming will be reached. This chapter discusses technical aspects of emotional speech synthesis, shows practical application and highlights new developments concerning the reali...


Study on Unit-Selection and Statistical Parametric Speech Synthesis Techniques

One interesting topic in the multimedia domain is enabling computers to produce speech. Speech synthesis grants computers the human ability to produce speech. The data-based approach and the process-based approach are the two main approaches to speech synthesis. Each approach has its own challenges. Unit-selection speech synthesis and statistical parametr...


Improvements to a Sample-Concatenation Based Singing Voice Synthesizer

This paper describes recent improvements to our singing voice synthesizer based on concatenation and transformation of audio samples using spectral models. Improvements include firstly robust automation of previous singer database creation process, a lengthy and tedious task which involved recording scripts generation, studio sessions, audio editing, spectral analysis, and phonetic based segmen...


Audio Morphing

Approach: There are two variants of our work: inter-voice morphing and intra-voice morphing. In the intra-voice morphing scenario, a single person’s voice is recorded uttering a wide range of utterances. The speaker’s phones are then morphed in time to generate new utterances of the speaker. We note that intra-voice morphing addresses the same problem that concatenative speech synthesis algorit...


Publication date: 2007
